Systems and methods for automatically detecting and repairing slot errors in machine learning training data for a machine learning-based dialogue system

ABSTRACT

Systems and methods for automatically detecting annotation discrepancies in annotated training data samples and repairing the annotated training data samples for a machine learning-based automated dialogue system include evaluating a corpus of a plurality of distinct training data samples; identifying one or more of a slot span defect and a slot label defect of a target annotated slot span of a target training data sample of the corpus based on the evaluation; and automatically correcting one or more annotations of the target annotated slot span based on the identified one or more of the slot span defect and the slot label defect.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/895,147, filed 8 Jun. 2020, which claims the benefit of U.S. Provisional Application No. 62/864,705, filed 21 Jun. 2019, U.S. Provisional Application No. 62/978,933, filed 20 Feb. 2020, and U.S. Provisional Application No. 63/000,000, filed 26 Mar. 2020, all of which are incorporated herein in their entireties by this reference.

GOVERNMENT RIGHTS

The subject matter of the invention may be subject to U.S. Government Rights under National Science Foundation grants: NSF SBIR Phase 1 Grant 1622049 and NSF SBIR Phase 2 Grant 1738441.

TECHNICAL FIELD

The inventions herein relate generally to the machine learning and artificially intelligent dialogue systems fields, and more specifically to new and useful systems and methods for intelligently implementing machine learning models of a machine learning-based conversational service in the machine learning field.

BACKGROUND

Modern virtual assistants and/or online chatbots may typically be employed to perform various tasks or services based on an interaction with a user. Typically, a user interacting with a virtual assistant may pose a question or otherwise submit a command to the virtual assistant to which the virtual assistant may provide a response or a result. Many of these virtual assistants may be implemented using a rules-based approach, which typically requires coding or preprogramming many or hundreds of rules that may govern a manner in which the virtual assistant should operate to respond to a given query or command from a user.

While the rules-based approach for implementing a virtual assistant may be useful for addressing pointed or specific queries or commands made by a user, the rigid or finite nature of this approach severely limits a capability of a virtual assistant to address queries or commands from a user that exceed the scope of the finite realm of pointed and/or specific queries or commands that are addressable by the finite set of rules that drive the response operations of the virtual assistant.

That is, the modern virtual assistants implemented via a rules-based approach for generating responses to users may not fully satisfy queries and commands posed by a user for which there are no predetermined rules to provide a meaningful response or result to the user.

Additionally, while machine learning enhances capabilities of artificially intelligent conversational systems, inefficiencies continue to persist in training the underlying machine learning models performing classification and predictive functions of the artificially intelligent conversation systems.

Therefore, there is a need in the machine learning field for systems and methods that enable rapid and efficient training of machine learning models and for a flexible virtual assistant solution that is capable of evolving beyond a finite set of rules for effectively and conversantly interacting with a user. The embodiments of the present application described herein provide technical solutions that address, at least, the need described above, as well as the deficiencies of the state of the art described throughout the present application.

SUMMARY OF THE INVENTION(S)

In one embodiment, a method for automatically detecting annotation discrepancies in annotated training data samples and repairing the annotated training data samples for a machine learning-based automated dialogue system includes: evaluating a corpus of a plurality of distinct training data samples; identifying one or more of a slot span defect and a slot label defect of a target annotated slot span of a target training data sample of the corpus based on the evaluation; and automatically correcting one or more annotations of the target annotated slot span based on the identified one or more of the slot span defect and the slot label defect.

In one embodiment, automatically correcting the one or more annotations of the target annotated slot span includes one of: automatically expanding the target annotated slot span to include one or more surrounding tokens based on identifying the slot span defect; and automatically contracting the target annotated slot span to exclude one or more tokens within the annotated slot span based on detecting the slot span defect.

In one embodiment, automatically correcting the one or more annotations of the target annotated slot span includes one or more of: deleting one or more erroneous slot labels from the target annotated slot span based on identifying the slot label defect, and annotating to the target annotated slot span a proper slot label based on slot labels applied to a majority of like slot spans within the corpus.

In one embodiment, the slot span defect relates to an error or an inconsistency in a setting of a length of a given annotated slot span of a given training data sample, wherein the length of the given annotated slot span is set such that the given annotated slot span erroneously includes one or more extraneous tokens or erroneously excludes one or more pertinent tokens of the given training data sample.

In one embodiment, the slot label defect relates to an error or an inconsistency in an application of one or more distinct slot classification labels to a given annotated slot span of a given training data sample.

In one embodiment, the method includes constructing a slot span format evaluator for the target annotated slot span based on span format attributes of one or more of the plurality of distinct training data samples of the corpus, wherein constructing the slot span format evaluator includes: defining a left- and right n-gram set for a slot type of the target annotated slot span, wherein the slot type relates to one of a plurality of distinct slot classification labels.

In one embodiment, a given left n-gram set of the left- and right n-gram set comprises a set of token sequences that excludes a last token in an n-gram, a given right n-gram set of the left- and right n-gram sets comprises a set of token sequences that excludes the first token in the n-gram, and the n-gram is a sequence of tokens of a given training data sample within the corpus of annotated training samples.
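By way of a non-limiting illustration, the left and right n-gram sets for a slot type might be derived as in the following minimal sketch. The sketch assumes that the n-grams of interest are the annotated slot spans of each slot type and that single-token spans contribute nothing to either set; the function name `build_ngram_sets` and the sample data are hypothetical and are not drawn from the present application.

```python
from collections import defaultdict


def build_ngram_sets(spans_by_type):
    """Build left and right n-gram sets per slot type.

    spans_by_type maps a slot type to a list of token tuples, each tuple
    being one annotated slot span (an assumption of this sketch).
    """
    left_sets = defaultdict(set)
    right_sets = defaultdict(set)
    for slot_type, spans in spans_by_type.items():
        for span in spans:
            if len(span) < 2:
                # Dropping a token from a one-token span leaves nothing.
                continue
            left_sets[slot_type].add(tuple(span[:-1]))   # excludes the last token
            right_sets[slot_type].add(tuple(span[1:]))   # excludes the first token
    return dict(left_sets), dict(right_sets)


# Hypothetical annotated spans for a "date" slot type.
example_spans = {"date": [("next", "friday"), ("friday",), ("last", "week")]}
left_sets, right_sets = build_ngram_sets(example_spans)
```

In this example, "next" and "last" become left-set members and "friday" and "week" become right-set members for the "date" slot type.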

In one embodiment, evaluating the corpus of the plurality of distinct training data samples includes iterating the slot format evaluator over each of the plurality of distinct training data samples within the corpus, wherein iterating the slot format evaluator includes: identifying whether a segment of a given annotated slot span of a given training data sample is positioned adjacent to an un-labeled span of the given training data sample having a member token within the left- and right n-gram set for a slot type of the given annotated training data sample; and if the segment of a given annotated slot span is adjacent to the un-labeled span that includes the member token, identifying the member token as a candidate for inclusion into the given annotated slot span.

In one embodiment, evaluating the corpus of the plurality of distinct training data samples includes: implementing a voting stage that informs a candidate exclusion policy or a candidate inclusion policy for the member token, wherein the voting stage includes: identifying candidate-slot pairs from within the corpus, wherein each candidate-slot pair includes a pairing of the member token with an adjacent token as found throughout distinct training data samples within the corpus, wherein the voting stage informs the candidate inclusion policy if a majority of candidate-slot pairs occur labeled as part of a single slot span thereby adding the member token to the given annotated slot span.
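As a hedged illustration of the voting stage described above, the following minimal sketch counts how often a candidate-slot pair occurs within a single annotated span across a corpus. The sketch assumes each training data sample is represented as a token list plus a list of (start, end, label) spans with an exclusive end index, and that a strict majority is required; the names `same_span` and `vote_include` and the sample corpus are hypothetical.

```python
def same_span(spans, i, j):
    """Return True if token positions i and j fall inside one annotated span."""
    return any(start <= i and j < end for start, end, _label in spans)


def vote_include(corpus, pair):
    """Vote on whether an adjacent token pair should share one slot span.

    corpus is a list of (tokens, spans) tuples; pair is (left_token, right_token).
    Returns True only when a strict majority of occurrences are jointly labeled.
    """
    total = 0
    inside = 0
    for tokens, spans in corpus:
        for i in range(len(tokens) - 1):
            if (tokens[i], tokens[i + 1]) == pair:
                total += 1
                if same_span(spans, i, i + 1):
                    inside += 1
    return total > 0 and inside * 2 > total


# Hypothetical corpus: the third sample omits "next" from the date span.
example_corpus = [
    (["pay", "next", "friday"], [(1, 3, "date")]),
    (["due", "next", "friday"], [(1, 3, "date")]),
    (["see", "you", "next", "friday"], [(3, 4, "date")]),
]
include_next = vote_include(example_corpus, ("next", "friday"))
```

Here two of three occurrences label "next friday" as a single span, so the vote favors inclusion and the third sample's span would be a candidate for expansion.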

In one embodiment, the method includes designating the given annotated slot span as an archetypal slot span for the slot type of the target annotated slot span, and identifying whether the target annotated slot span includes the slot span defect is based on a comparison of the target annotated slot span to the archetypal slot span.

In one embodiment, evaluating the corpus of the plurality of distinct training data samples includes iterating the slot format evaluator over each of the plurality of distinct training data samples within the corpus, wherein iterating the slot format evaluator includes: identifying whether a given annotated slot span is positioned adjacent to an un-labeled span not having a member token within the left- and right n-gram set for a slot type of the given annotated slot span; and if the given annotated slot span is adjacent to the un-labeled span that does not include the member token, identifying a token within the given annotated slot span that is adjacent to the un-labeled slot span as a candidate for exclusion from the given annotated slot span.

In one embodiment, evaluating the corpus of the plurality of distinct training data samples includes: implementing a voting stage that informs a candidate exclusion policy or a candidate inclusion policy for the member token, wherein the voting stage includes: identifying candidate-slot pairs from within the corpus, wherein each candidate-slot pair includes a pairing of the member token with an adjacent token as found throughout the plurality of distinct training data samples within the corpus, wherein the voting stage informs the candidate exclusion policy if a majority of candidate-slot pairs do not occur labeled as part of a single annotated slot span thereby excluding the member token from the given annotated slot span.

In one embodiment, the method includes designating the given annotated slot span as an archetypal slot span for the slot type of the target annotated slot span, and identifying whether the target annotated slot span includes the slot span defect is based on a comparison of the target annotated slot span to the archetypal slot span.

In one embodiment, the method includes implementing a label variation evaluator, wherein the label variation evaluator includes a variation n-gram that identifies whether there is the slot label defect in the given annotated training data sample, wherein implementing the label variation evaluator includes: (1) setting the given annotated slot span as a slot nucleus; (2) setting a fixed radius, k, around the nucleus, where k is a predetermined number of tokens; and (3) identifying all variation n-grams for the given training sample based on the slot nucleus and fixed radius, k, wherein each identified variation n-gram includes an n-gram that includes a combination of the slot nucleus and one or more tokens surrounding the slot nucleus within a given training data sample.
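Steps (1) through (3) above may be sketched, purely for illustration, as follows. The sketch assumes the slot nucleus is given by token indices [start, end) with an exclusive end, that up to k tokens of context may be taken on each side, and that a variation n-gram must contain at least one surrounding token; the function name `variation_ngrams` and the sample utterance are hypothetical.

```python
def variation_ngrams(tokens, start, end, k):
    """Enumerate variation n-grams around a slot nucleus.

    tokens[start:end] is the nucleus; k is the fixed radius in tokens.
    Each result contains the nucleus plus at least one surrounding token.
    """
    ngrams = []
    for left in range(k + 1):
        for right in range(k + 1):
            if left == 0 and right == 0:
                continue  # the bare nucleus carries no surrounding context
            lo, hi = start - left, end + right
            if lo < 0 or hi > len(tokens):
                continue  # context window would run past the sample boundary
            ngrams.append(tuple(tokens[lo:hi]))
    return ngrams


# Hypothetical sample with "savings" (index 3) as the slot nucleus.
example_tokens = ["transfer", "to", "my", "savings", "account", "today"]
example_ngrams = variation_ngrams(example_tokens, 3, 4, 1)
```

With a radius of k = 1, the nucleus "savings" yields three variation n-grams: "savings account", "my savings", and "my savings account".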

In one embodiment, the method includes implementing a voting stage that identifies whether a standard labeling convention for the variation n-gram exists within the corpus of annotated training sample data, wherein the standard labeling convention exists if a majority of annotated training data samples having the variation n-gram within the corpus include a distinct slot classification label, wherein if the standard labeling convention exists, the label variation evaluator determines whether the slot label annotated to the given training data sample is defective.

In one embodiment, if the slot label annotated to the given training data sample is defective, the method includes automatically repairing an annotation of the given annotated slot span to remove the slot label and ascribe a new slot label defined by the standard labeling convention to the given annotated slot span.
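For illustration only, the label voting and repair described above might be sketched as below. The sketch assumes that the slot labels observed for one variation n-gram across the corpus have already been collected into a list, and that a standard labeling convention requires a strict majority; the names `standard_label` and `repair_label` and the example labels are hypothetical.

```python
from collections import Counter


def standard_label(observed_labels):
    """Return the majority label for a variation n-gram, or None if none exists."""
    if not observed_labels:
        return None
    label, count = Counter(observed_labels).most_common(1)[0]
    # A convention exists only when one label covers a strict majority.
    return label if count * 2 > len(observed_labels) else None


def repair_label(current_label, observed_labels):
    """Replace a defective label with the convention label; otherwise keep it."""
    convention = standard_label(observed_labels)
    if convention is not None and convention != current_label:
        return convention
    return current_label


# Hypothetical labels seen in the corpus for one variation n-gram.
observed = ["account_from", "account_from", "account_to"]
repaired = repair_label("account_to", observed)
```

When no label reaches a majority, the sketch leaves the existing annotation untouched rather than guessing at a repair.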

In one embodiment, the slot span defect includes one or more identified slot span defect types including one or more of: an omission inconsistency; and an addition inconsistency.

In one embodiment, automatically correcting the one or more annotations of the target annotated slot span is further based on the one or more identified slot span defect types.

In one embodiment, the slot label defect includes one or more identified slot label defect types including one or more of: a wrong label inconsistency; a swapped label inconsistency; and a chop and join inconsistency.

In one embodiment, automatically correcting the one or more annotations of the target annotated slot span is further based on the one or more identified slot label defect types.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates a schematic representation of a system 100 in accordance with one or more embodiments of the present application;

FIG. 1A illustrates a schematic representation of a subsystem of system 100 in accordance with one or more embodiments of the present application;

FIG. 2 illustrates an example method in accordance with one or more embodiments of the present application; and

FIG. 3 illustrates a schematic representation of slot format and/or slot labeling discrepancies in accordance with one or more embodiments of the present application.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The following description of the preferred embodiments of the present application is not intended to limit the inventions to these preferred embodiments, but rather to enable any person skilled in the art to make and use these inventions.

Overview

As discussed above, existing virtual assistant implementations do not have the requisite flexibility to address unrecognized queries or commands from a user for which there are no predetermined rules designed around narrowly defined intents. This inflexible structure cannot reasonably and efficiently address the many variances in the manners in which a user may pose a query or command to the virtual assistant.

The embodiments of the present application, however, provide an artificially intelligent machine learning-based dialogue service and/or system with natural language processing capabilities that function to process and comprehend structured and/or unstructured natural language input from a user or input from any other suitable source and correspondingly provide highly conversant responses to dialogue inputs to the system. Using one or more trained (deep) machine learning models, such as a long short-term memory (LSTM) neural network, the embodiments of the present application may function to understand any variety of natural language utterance or textual input provided to the system. The one or more deep machine learning models post deployment can continue to train using unknown and previously incomprehensible queries or commands from users. As a result, the underlying system that implements the (deep) machine learning models may function to evolve with increasing interactions with users and training rather than being governed by a fixed set of predetermined rules for responding to narrowly defined queries, as may be accomplished in the current state of the art.

Accordingly, the evolving nature of the artificial intelligence platform described herein enables the artificially intelligent virtual assistant latitude to learn without a need for additional programming and the capabilities to ingest complex (or uncontemplated) utterances and text input to provide meaningful and accurate responses.

Additionally, systems and methods are provided that enable an intelligent curation of training data for machine learning models that enable a rapid and efficient training of machine learning models employed in a machine learning-based dialogue system.

Slot-filling Inconsistencies Overview

Data-driven slot-filling models in task-oriented dialog systems rely on carefully-annotated training data. However, annotations may often be performed by non-experts such as crowd workers, which can result in inconsistent and erroneous annotations. These issues may be resolved or partially mitigated by expert manual inspection, but such processes can be time consuming and costly. In one or more embodiments of the present application, multiple inconsistency types identified in slot-filling annotations are defined. In some embodiments, one or more techniques for automatically identifying these inconsistencies are introduced in the present application. In such embodiments, identifying and fixing such mistakes may lead to better performing slot-filling models downstream and an overall improvement in a quality of utterance data processing and response construction by a machine learning-based dialogue system, such as system 100 described in more detail below.

1. System for a Machine Learning-Based Dialogue System

As shown in FIG. 1, a system 100 that automatically sources training data and trains and/or configures machine learning models includes an artificial intelligence (AI) virtual assistant platform 110 (e.g., artificially intelligent dialogue platform), a machine learning configuration interface 120, a training/configuration data repository 130, a configuration data queue 135, and a plurality of external training/configuration data sources 140. Additionally, the system 100 may include an anomaly detection sub-system 170 that may function to receive training data samples as input and identify slot format and/or slot label errors and automatically correct the slot format and/or the slot label errors, as shown by way of example in FIG. 1A.

Generally, the system 100 functions to implement the artificial intelligence virtual assistant platform 110 to enable intelligent and conversational responses by an artificially intelligent virtual assistant to a user query and/or user command input into the system 100, as described in U.S. patent application Ser. No. 15/797,414 and U.S. Pat. Nos. 10,572,801 and 10,296,848, which are all incorporated herein in their entireties by this reference. Specifically, the system 100 functions to ingest user input in the form of text or speech into a user interface 160. At natural language processing components of the system 100 that may include, at least, the competency classification engine 120, the slot identification engine 130, and a slot value extractor 135, the system 100 functions to identify a competency classification label for the user input data and parse the user input data into comprehensible slots or segments that may, in turn, be converted into program-comprehensible and/or useable features. Leveraging the outputs of the natural language processing components of the system 100, the observables extractor 140 may function to generate handlers based on the outcomes of the natural language processing components and further, execute the generated handlers to thereby perform various operations that access one or more data sources relevant to the query or command and that also perform one or more operations (e.g., data filtering, data aggregation, and the like) on the data accessed from the one or more data sources.

The artificial intelligence virtual assistant platform 110 functions to implement an artificially intelligent virtual assistant capable of interacting and communicating with a user. The artificial intelligence platform 110 may be implemented via one or more specifically configured web or private computing servers (or a distributed computing system; e.g., the cloud) or any suitable system for implementing the system 100 and/or the method 200.

In some implementations, the artificial intelligence virtual assistant platform 110 may be a remote platform implemented over the web (e.g., using web servers) that is configured to interact with distinct and disparate service providers. In such an implementation, an event such as a user attempting to access one or more services or data from one or more data sources of the service provider may trigger an implementation of the artificially intelligent virtual assistant of the AI platform 110. Thus, the AI virtual assistant platform 110 may work in conjunction with the service provider to attend to the one or more queries and/or commands of the users of the service provider. In this implementation, the data sources 160 may be data sources of the service provider that are external data sources to the AI virtual assistant platform 110.

The competency classification engine 120 together with the slot identification engine 130 and the slot value extractor 135 preferably function to define a natural language processing (NLP) component of the artificial intelligence platform 110. In one implementation, the natural language processing component may additionally include the automatic speech recognition unit 105.

The competency classification engine 120 functions to implement one or more competency classification machine learning models to label user input data comprising a user query or a user command. The one or more competency classification machine learning models may include one or more deep machine learning algorithms (e.g., a recurrent neural network, etc.) that have been specifically trained to identify and/or classify a competency label for utterance input and/or textual input. The training input used in training the one or more deep machine learning algorithms of the competency classification engine 120 may include crowdsourced data obtained from one or more disparate user query or user command data sources and/or platforms (e.g., messaging platforms, etc.). However, it shall be noted that the system 100 may obtain training data from any suitable external data sources. The one or more deep machine learning algorithms may additionally be continually trained using user queries and user commands that were mispredicted or incorrectly analyzed by the system 100 including the competency classification engine 120.

The competency classification engine 120 may additionally be configured to generate or identify one competency classification label for each user query and/or user command input into the engine 120. The competency classification engine 120 may be configured to identify or select from a plurality of predetermined competency classification labels (e.g., Income, Balance, Spending, Investment, Location, etc.). Each competency classification label available to the competency classification engine 120 may define a universe of competency-specific functions available to the system 100 or the artificially intelligent assistant for handling a user query or user command. That is, once a competency classification label is identified for a user query or user command, the system 100 may use the competency classification label to restrict one or more computer-executable operations (e.g., handlers) and/or filters that may be used by system components when generating a response to the user query or user command. The one or more computer-executable operations and/or filters associated with each of the plurality of competency classifications may be different and distinct and thus, may be used to process user queries and/or user commands differently as well as used to process user data (e.g., transaction data obtained from external data sources 160).

Additionally, the competency classification machine learning model 120 may function to implement a single deep machine learning algorithm that has been trained to identify multiple competency classification labels. Alternatively, the competency classification machine learning model 120 may function to implement an ensemble of deep machine learning algorithms in which each deep machine learning algorithm of the ensemble functions to identify a single competency classification label for user input data. For example, if the competency classification model 120 is capable of identifying three distinct competency classification labels, such as Income, Balance, and Spending, then the ensemble of deep machine learning algorithms may include three distinct deep machine learning algorithms that classify user input data as Income, Balance, and Spending, respectively. While each of the deep machine learning algorithms that define the ensemble may individually be configured to identify a specific competency classification label, the combination of deep machine learning algorithms may additionally be configured to work together to generate individual competency classification labels. For example, if the system receives user input data that is determined to be highly complex (e.g., based on a value or computation of the user input data exceeding a complexity threshold), the system 100 may function to selectively implement a subset (e.g., three machine learning algorithms from a total of nine machine learning algorithms or the like) of the ensemble of machine learning algorithms to generate a competency classification label.

Additionally, the competency classification engine 120 may be implemented by the one or more computing servers, computer processors, and the like of the artificial intelligence virtual assistance platform 110.

The slot identification engine 130 functions to implement one or more machine learning models to identify slots or meaningful segments of user queries or user commands and to assign a slot classification label for each identified slot. The one or more machine learning models implemented by the slot identification engine 130 may implement one or more trained deep machine learning algorithms (e.g., recurrent neural networks). The one or more deep machine learning algorithms of the slot identification engine 130 may be trained in any suitable manner including with sample data of user queries and user commands that have been slotted and assigned slot values and/or user system derived examples. Alternatively, the slot identification engine 130 may function to implement an ensemble of deep machine learning algorithms in which each deep machine learning algorithm of the ensemble functions to identify distinct slot labels or slot type labels for user input data. For example, if the slot identification engine 130 is capable of identifying multiple distinct slot classification labels, such as Income, Account, and Date labels, then the ensemble of deep machine learning algorithms may include three distinct deep machine learning algorithms that function to classify segments or tokens of the user input data as Income, Account, and Date, respectively.

A slot, as referred to herein, generally relates to a defined segment of user input data (e.g., user query or user command), a training data sample, an utterance sample or the like that may include one or more data elements (e.g., terms, values, characters, media, etc.). Accordingly, the slot identification engine 130 may function to decompose a query or command into defined, essential components that implicate meaningful information to be used when generating a response to the user query or command.

A slot label, which may also be referred to herein as a slot classification label, may be generated by the one or more slot classification deep machine learning models of the engine 130. A slot label, as referred to herein, generally relates to one of a plurality of slot classification labels that generally describes a slot (or the data elements within the slot) of a user query or user command. The slot label may define a universe or set of machine or program-comprehensible objects that may be generated for the data elements within an identified slot.

Like the competency classification engine 120, the slot identification engine 130 may implement a single deep machine learning algorithm or an ensemble of deep machine learning algorithms. It shall be recognized that the slot identification engine may function to implement any suitable machine learning model and/or algorithm, including, but not limited to, a conditional random field (CRF) for performing slot-filling tasks and the like. Additionally, the slot identification engine 130 may be implemented by the one or more computing servers, computer processors, and the like of the artificial intelligence virtual assistance platform 110.

The machine learning models and/or the ensemble of machine learning models may employ any suitable machine learning including one or more of: supervised learning (e.g., using logistic regression, using back propagation neural networks, using random forests, decision trees, etc.), unsupervised learning (e.g., using an Apriori algorithm, using K-means clustering), semi-supervised learning, reinforcement learning (e.g., using a Q-learning algorithm, using temporal difference learning), and any other suitable learning style. Each module of the plurality can implement any one or more of: a regression algorithm (e.g., ordinary least squares, logistic regression, stepwise regression, multivariate adaptive regression splines, locally estimated scatterplot smoothing, etc.), an instance-based method (e.g., k-nearest neighbor, learning vector quantization, self-organizing map, etc.), a regularization method (e.g., ridge regression, least absolute shrinkage and selection operator, elastic net, etc.), a decision tree learning method (e.g., classification and regression tree, iterative dichotomizer 3, C4.5, chi-squared automatic interaction detection, decision stump, random forest, multivariate adaptive regression splines, gradient boosting machines, etc.), a Bayesian method (e.g., naïve Bayes, averaged one-dependence estimators, Bayesian belief network, etc.), a kernel method (e.g., a support vector machine, a radial basis function, a linear discriminant analysis, etc.), a clustering method (e.g., k-means clustering, expectation maximization, etc.), an association rule learning algorithm (e.g., an Apriori algorithm, an Eclat algorithm, etc.), an artificial neural network model (e.g., a Perceptron method, a back-propagation method, a Hopfield network method, a self-organizing map method, a learning vector quantization method, etc.), a deep learning algorithm (e.g., a restricted Boltzmann machine, a deep belief network method, a convolution network method, a stacked auto-encoder method, etc.), a dimensionality reduction method (e.g., principal component analysis, partial least squares regression, Sammon mapping, multidimensional scaling, projection pursuit, etc.), an ensemble method (e.g., boosting, bootstrapped aggregation, AdaBoost, stacked generalization, gradient boosting machine method, random forest method, etc.), and any suitable form of machine learning algorithm. Each processing portion of the system 100 can additionally or alternatively leverage: a probabilistic module, heuristic module, deterministic module, or any other suitable module leveraging any other suitable computation method, machine learning method or combination thereof. However, any suitable machine learning approach can otherwise be incorporated in the system 100. Further, any suitable model (e.g., machine learning, non-machine learning, etc.) can be used in implementing the artificially intelligent virtual assistant and/or other components of the system 100.

The slot value extraction unit 135 functions to generate slot values by extracting each identified slot and assigned slot label of the user query or user command and converting the data elements (i.e., slot data) within the slot to a machine or program-comprehensible object or instance (e.g., term or value); that is, the slot label is mapped to coding or data that a computer or program of the system 100 comprehends and is able to manipulate or execute processes on. Accordingly, using the slot label generated by the slot identification engine 130, the slot extraction unit 135 identifies a set or group of machine or program-comprehensible objects or instances that may be applied to slot data of a slot assigned with the slot label. Thus, the slot extraction unit 135 may convert the slot data of a slot to a machine or program-comprehensible object (e.g., slot values) based on the slot label and specifically, based on the available objects, instances, or values mapped to or made available under the slot label.
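As one hedged illustration of mapping a slot label to a program-comprehensible object, the following minimal sketch keys a converter on the slot label. The slot labels "Date" and "Amount", the ISO date format, the converter table `SLOT_CONVERTERS`, and the function `extract_slot_value` are all hypothetical assumptions made for this example and are not specified by the present application.

```python
from datetime import date

# Hypothetical mapping from slot label to a converter that turns raw slot
# text into a program-comprehensible object (illustrative assumption only).
SLOT_CONVERTERS = {
    "Date": date.fromisoformat,
    "Amount": lambda text: float(text.replace("$", "").replace(",", "")),
}


def extract_slot_value(slot_label, slot_text):
    """Convert slot text to an object based on its slot label.

    Slot labels without a registered converter fall back to the raw text.
    """
    converter = SLOT_CONVERTERS.get(slot_label)
    if converter is None:
        return slot_text
    return converter(slot_text)


amount_value = extract_slot_value("Amount", "$1,200.50")
date_value = extract_slot_value("Date", "2020-06-08")
```

Keying the conversion on the slot label reflects the description above, in which the label determines the set of objects available for the slot data.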

The observables extractor 140 functions to use the slot values comprising the one or more program-comprehensible objects generated at the slot value extraction unit 135 to determine or generate one or more handlers or subroutines for handling the data of or responding to the user query or user command of user input data. The observables extractor 140 may function to use the slot values provided by the slot value extraction unit 135 to determine one or more data sources relevant to and for addressing the user query or the user command and determine one or more filters and functions or operations to apply to data accessed or collected from the one or more identified data sources. Thus, the coding or mapping of the slot data, performed by the slot value extraction unit 135, to program-comprehensible objects or values may be used to specifically identify the data sources and/or the one or more filters and operations for processing the data collected from the data sources.

The response generator 150 functions to use the competency classification label of the user input data to identify or select one predetermined response template or one of a plurality of predetermined response templates. For each competency classification label of the system 100, the system 100 may have stored a plurality of response templates that may be selected by the response generator 150 based on an identified competency classification label for user input data. Additionally, or alternatively, the response template may be selected based on both the competency classification label and one or more generated slot values. In such instance, the one or more slot values may function to narrow the pool of response templates selectable by the response generator to a subset of a larger pool of response templates to account for the variations in a query or user command identified in the slot values. The response templates may generally be a combination of predetermined output language or text and one or more input slots for interleaving the handler outputs determined by the observables extractor 140.
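A minimal sketch of such template selection is given below, assuming a hypothetical BALANCE competency label, a hypothetical ACCOUNT slot, and a hypothetical handler output named balance. The rule of preferring the template that consumes the most slot values is likewise an assumption made for illustration.

```python
# Hypothetical templates keyed by competency label; each lists the slots it needs.
TEMPLATES = {
    "BALANCE": [
        {"requires": {"ACCOUNT"}, "text": "Your {ACCOUNT} balance is {balance}."},
        {"requires": set(), "text": "Your total balance is {balance}."},
    ],
}

def select_template(competency_label, slot_values):
    """Pick the most specific template whose required slots are all present."""
    candidates = [
        t for t in TEMPLATES[competency_label]
        if t["requires"] <= set(slot_values)
    ]
    # Prefer the template that uses the most slot values (an assumed policy).
    return max(candidates, key=lambda t: len(t["requires"]))

def render(competency_label, slot_values, handler_outputs):
    """Interleave slot values and handler outputs into the selected template."""
    template = select_template(competency_label, slot_values)
    return template["text"].format(**slot_values, **handler_outputs)
```

Here the competency label selects the pool of templates and the slot values narrow that pool to a subset, after which the handler outputs fill the remaining input slots.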

The user interface system 105 may include any type of device or combination of devices capable of receiving user input data and presenting a response to the user input data from the artificially intelligent virtual assistant. In some embodiments, the user interface system 105 receives user input data in the form of a verbal utterance and passes the utterance to the automatic speech recognition unit 115 to convert the utterance into text. The user interface system 105 may include, but is not limited to, mobile computing devices (e.g., mobile phones, tablets, etc.) having a client application of the system 100, desktop computers or laptops implementing a web browser, an automated teller machine, virtual and/or personal assistant devices (e.g., Alexa, Google Home, Cortana, Jarvis, etc.), chatbots or workbots, etc. An intelligent personal assistant device (e.g., Alexa, etc.) may be any type of device capable of touchless interaction with a user to perform one or more tasks or operations including providing data or information and/or controlling one or more other devices (e.g., computers, other user interfaces, etc.). Thus, an intelligent personal assistant may be used by a user to perform any portions of the methods described herein, including the steps and processes of method 200, described below. Additionally, a chatbot or a workbot may include any type of program (e.g., slack bot, etc.) implemented by one or more devices that may be used to interact with a user using any type of input method (e.g., verbally, textually, etc.). The chatbot or workbot may be embedded or otherwise placed in operable communication and/or control of a communication node and thus, capable of performing any process or task including, but not limited to, acquiring and providing information and performing one or more control operations.

2. Method for Automatically Detecting and Repairing Slot Errors & Inconsistencies in Training Data

FIG. 2 illustrates an exemplary method 200 for detecting annotation discrepancies in training data for automatically categorizing and/or recategorizing training data for use within and improving a machine learning-based dialogue system. The method 200, in some embodiments, includes optionally gathering and compiling annotated and/or labeled data S210, identifying inconsistent or erroneous slot format S220, deriving or building a slot span format evaluator S230, identifying one or more annotation errors or inconsistencies in one or more slot labels S240, and implementing a variation n-gram method S250. The method 200 may optionally include synthesizing an application of the slot format evaluator and the slot label variation evaluator S260.

In one or more embodiments of the present application, the method 200 preferably enables an intelligent, automatic correction of labeled or annotated data that may be subject to multiple types or categories of errors, inconsistencies or misclassifications. The method 200 may function to automatically characterize an archetypal or prototypical slot pattern that may incorporate one or more of a slot's format, content, length in number of words or tokens, type of token, or any other suitable input or parameter. Such a characterization may function to check and/or validate slot annotations within a dataset to ensure consistency and correctness of the labeled slots.

Accordingly, the method 200 functions to implement a procedure to increase the ease and computational efficiency of processes that may involve correcting errors in annotation or labelling. Additionally, the method 200 may function to increase the performance of a machine learning model as measured by accuracy, precision, recall, F1 score, or any other suitable metric by increasing the fidelity of the labels or annotations of the training data corpus or corpora.

2.1 Sourcing and/or Gathering Training Sample Data

Optionally, or alternatively, S210, which includes sourcing machine learning training data from one or more training data sources, may function to collect machine learning training data from one or more of a plurality of internal and/or external (e.g., third-party) sources of training data. In some embodiments, the method 200 may function to source and/or collect training data implementing the methods and/or systems described in U.S. Pat. No. 10,296,848, which is incorporated in its entirety by this reference.

In a preferred embodiment, the machine learning training data from an external machine learning training data source comprises a plurality of labeled training samples proliferated based on or using the input of seed machine learning data samples. Accordingly, the machine learning training data returned from the external machine learning training data source may include a large number (e.g., hundreds, thousands, millions, etc.) of labeled data samples that are variants of the seed machine learning data samples. That is, the labeled data samples returned by the external or internal training data source may have the same or similar meanings to one or more of the example user queries, example utterances, and/or one or more example user prompts.

Additionally, S210 preferably functions to source the machine learning training data from the external training data sources and the internal training data sources in parallel. That is, S210 may function to collect machine learning training data from each of the plurality of external training data sources and possibly one or more internal training data sources at the same time without waiting for any one external or internal training data source to provide a completed corpus of training data samples.

Additionally, or alternatively, S210 may function to store the collected machine learning training data from each of a plurality of external or internal machine learning training data sources in disparate datastores. That is, S210 may configure a distinct and separate datastore for receiving and storing machine learning training data for each of the plurality of external or internal machine learning training data sources. In this way, specific processing of the machine learning training data may be performed on a per training data source basis.

Additionally, or alternatively, S210 may function to store the collected machine learning training data from the plurality of external or internal machine learning training data sources in a single datastore. In some embodiments, all machine learning training data may be mixed together or combined. Alternatively, S210 may function to augment the machine learning training data with metadata that identifies the external or internal machine learning training data source from which each labeled data sample originated.

Additionally, or alternatively, S210 may function to store the collected machine learning training data in one or more training data queues. The one or more training data queues may function to store the collected machine learning training data for a predefined period. In some embodiments, unless one or more machine learning training data samples are pruned or extracted from the one or more training data queues, S210 may function to automatically load the training data in the one or more training data queues directly into a corresponding or assigned machine learning model. That is, the training data in the queues may be used by the live machine learning system to generate one or more live machine learning classification labels or the like. The predefined period may be set to any suitable period that preferably enables an opportunity for a processing system to evaluate and refine the training data samples from the external or internal training data sources.

Additionally, or alternatively, S210 may implement one or more thresholds for each of the plurality of external or internal training data sources that may function to limit an amount of training data that may be collected from each of the plurality of external or internal training data sources. Once S210 detects that a limit or threshold is met for a specific external or internal training data source, S210 may cease collecting or accepting training data from the specific external or internal training data source and may further signal the specific external or internal training data source to stop transmitting machine learning training data to the machine learning-based dialogue service.

The limits or thresholds for each of the plurality of external or internal training data sources may be preset (e.g., may be an input value at the configuration console) or dynamic and may be different for each of the plurality of external or internal training data sources. For instance, a training data limit or training data threshold for each of the plurality of external or internal training data sources may be set based on a calculated level of quality assigned to each of the plurality of external or internal training data sources. The level of quality preferably relates to an accuracy of labels generated by the external or internal training data source for each labeled training data sample provided thereby. Thus, a higher calculated level of quality of training data for a given external or internal training data source may enable a higher limit or threshold for receiving labeled training data samples. For instance, a first external or internal training data source may have a high level of quality (judged on a scale of 0-10, e.g., a quality level of 8 or the like) and thus, may be assigned a high threshold (e.g., 1000 samples or the like). A second external or internal training data source may have a low level of quality (e.g., a quality level of 2, etc.) and thus, may be assigned a low threshold (e.g., 100 samples or the like).
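One way a quality-based limit might be applied is sketched below. The tier boundaries (7 and 4) and the middle limit of 500 samples are assumptions introduced for illustration; only the pairings of a quality level of 8 with 1000 samples and a quality level of 2 with 100 samples come from the example above. The class and function names are hypothetical.

```python
def sample_limit(quality: float) -> int:
    """Map a 0-10 quality score to a per-source sample limit (assumed tiers)."""
    if not 0 <= quality <= 10:
        raise ValueError("quality must be in [0, 10]")
    if quality >= 7:
        return 1000
    if quality >= 4:
        return 500  # middle tier is an assumption; the text gives only two examples
    return 100

class SourceCollector:
    """Accept samples from one training data source until its limit is reached."""

    def __init__(self, quality: float):
        self.limit = sample_limit(quality)
        self.samples = []

    def offer(self, sample) -> bool:
        """Store the sample and return True, or return False as a stop signal."""
        if len(self.samples) >= self.limit:
            return False
        self.samples.append(sample)
        return True
```

A return value of False from offer stands in for the signal instructing the training data source to stop transmitting.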

Additionally, or alternatively, a new observation preferably relates to a data set or a piece of data on which a machine learning model has not previously been trained.

In some embodiments, a source of the new observation data in S210 may include proprietary data collected from virtual agents deployed in a production environment (i.e., from production systems or production logs). For example, data from a system in use may be recorded and used for this purpose.

In another implementation, in the absence of a deployed virtual agent, or in the interest of broadening a training sample dataset, data may be collected by means of requesting and recording new relevant interactions specific to the domain and type of training data desired for training one or more models. This could include hiring agents to perform novel interactions, using pre-existing datasets (e.g., movie scripts, etc.) or any other method of generating new data.

Additionally, or alternatively, training data may be required to be labeled and/or annotated in order to be used in the training process. In one implementation, data may be labeled and/or annotated by humans trained to perform such a task, in a centralized location such as a datacenter with on-site workers, or in a distributed manner such as crowdsourcing, e.g., leveraging a platform such as Amazon Mechanical Turk.

2.2 Slot Annotation Error or Inconsistency Identification

S220, which includes detecting or identifying inconsistent or erroneous slot format (e.g., a slot segment defect, etc.), may function to identify and/or correct slot format inconsistencies of one or more training samples. That is, S220 may function to automatically determine whether one or more structures of one or more slot types of a training sample may not be correctly annotated. In one or more embodiments, S220 may include evaluating each of a plurality of distinct training samples of a corpus of raw machine learning training data for slot format compliance and/or discrepancies. As referred to herein, raw machine learning training data preferably relates to machine learning training data that may not have been evaluated, processed, and/or approved to be used in a formal training process flow for a deployable machine learning model.

2.2.1 Slot Format Errors or Inconsistencies

S220, which may include identifying one or more annotation inconsistencies or errors within the format of a slot span of a given training sample, may function to evaluate each distinct slot span of a given training sample and identify whether the slot span includes one or more slot annotation errors or slot annotation inconsistencies. Thus, if a given training sample includes multiple annotated slot spans, S221 may function to individually evaluate each of the multiple annotated slot spans (possibly in parallel) of the training sample for slot span-related errors or inconsistencies.

A. Slot Format Inconsistency

In some embodiments, a first slot span discrepancy type may include a slot format error and/or a slot format inconsistency. A slot format error or inconsistency may relate to instances in which certain tokens of a given training sample are inconsistently or incorrectly included in or excluded from an annotated slot span. That is, slot format errors or inconsistencies may occur when the structure or a category (e.g., label or annotation) of a slot span may not be applied or may not be annotated in a consistent manner relative to a plurality of or a majority of other slots within a given corpus of training data or relative to a predetermined standard for slot span annotation. In a non-limiting example, in an attempt to properly label a slot span of a given training sample, such as “to my checking account please”, with an ACCOUNT slot label, it is possible that multiple distinct annotations of the tokens of this training sample may arise. For instance, one annotation may apply the ACCOUNT slot classification label to the three-token slot span “my checking account”, a different annotation of the same or a similar training sample may apply the ACCOUNT slot classification label to the two-token slot span “checking account”, and yet another annotation may apply the ACCOUNT slot classification label to the single token “checking”, rendering these annotations of distinct slot spans with the same slot classification label inconsistent. That is, the slot classification label of ACCOUNT may be the proper slot span category or slot classification label; however, the slot classification label is applied variably to different slot span lengths.

Accordingly, in some embodiments, the slot span inconsistency or slot span error may be that a given slot labeling of a slot span excludes one or more pertinent tokens (i.e., a token that should be included in a proper annotation of a slot span) and/or that the given slot labeling of the slot span includes one or more extraneous tokens (i.e., a token that should not be included in a proper annotation of a slot span).

Depending on the nature of the downstream task for which slot-filling is being used, the presence or absence of certain tokens in an extracted slot may affect overall dialog system performance.

B. Chop and Join Error

In one or more embodiments, a second and a third slot span discrepancy type may include chop and join errors or inconsistencies, respectively. A chop inconsistency or error may relate to a token span being labeled as multiple annotations when a single annotation is appropriate or preferred according to a predetermined slot span labeling technique or the like. A join inconsistency or error may relate to a token span being labeled as one continuous slot annotation when, in fact, several annotations are appropriate or preferred. That is, chop and join errors may occur when a particular slot type spans multiple tokens without consistency about whether a single label or multiple labels are applied to the sub-spans (e.g., “Canadian dollars” vs. “Mexican” “pesos”).
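As a non-limiting sketch, chop and join conventions within a corpus may be tallied per slot type so that the minority convention can be reviewed. The sketch assumes that samples are lists of (token, tag) pairs using BIO-style tags (e.g., B-CURRENCY, I-CURRENCY, O); the BIO encoding, the CURRENCY label, and the function names are assumptions and are not prescribed by the method 200.

```python
from collections import Counter

def bio_spans(sample):
    """Return (start, end, slot_type) for each span in a BIO-tagged sample."""
    found, start, current = [], None, None
    # A sentinel "O" token closes a span that runs to the end of the sample.
    for i, (_, tag) in enumerate(sample + [("", "O")]):
        if start is not None and tag != f"I-{current}":
            found.append((start, i, current))
            start, current = None, None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
    return found

def chop_join_tally(corpus):
    """Count, per slot type, multi-token spans versus adjacent same-type spans."""
    tally = Counter()
    for sample in corpus:
        spans = bio_spans(sample)
        for start, end, slot_type in spans:
            if end - start > 1:
                tally[slot_type, "joined"] += 1
        for (_, end_a, type_a), (start_b, _, type_b) in zip(spans, spans[1:]):
            if type_a == type_b and end_a == start_b:
                tally[type_a, "chopped"] += 1
    return tally
```

Whether the joined or the chopped form is the error depends on the predetermined slot labeling technique; the tally only exposes which form is the less common one in a given corpus.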

Accordingly, as evaluated, S221 may identify that a given training sample includes one or more extraneous slot labels to one or more slot spans of the given training sample and/or that the given training sample may be missing one or more pertinent slot labels to one or more slot spans of the given training sample. In one or more embodiments, an extraneous nature and/or a pertinent nature of one or more distinct slot classification labels may be determined based on a predetermined slot labeling technique or scheme, which may be defined by the machine learning-based dialogue service or the like. In such embodiments, the predetermined slot labeling technique may include instructions for setting or configuring proper slot span lengths and associated instructions for setting or applying one or more distinct slot classification labels to given slot spans.

2.3 Slot Format Evaluator

S230, which includes deriving or building a slot span format evaluator (checker), may function to construct one or more archetypal or prototypical slot span formats for one or more distinct slot spans of a corpus of training data and/or build a slot span algorithm that detects one or more slot format inconsistencies or errors within training samples of a corpus of training data. In one or more embodiments, a slot span format evaluator may be training data corpus-specific, in that, for each distinct corpus of training data, a distinct slot span format evaluator may be derived and applied against the training samples of the distinct corpus. In such embodiments, in deriving a slot span evaluator, the most common or prevalent slot span formats within a distinct corpus of training data may be used to define or inform a design or architecture of a slot span evaluator for the distinct corpus.

In one or more embodiments, S230 may function to evaluate each annotated training sample of a corpus or corpora of annotated training data and attempt to find scenarios in the corpus or the corpora where a labeled slot span exists next to an unlabeled token or token span. In one embodiment, S230 may function to identify instances in the corpus of annotated training sample data wherein a token or token span may be inconsistently or erroneously annotated as belonging to a slot span, defined as extraneous tokens or extraneous token spans. In another embodiment, S230 may function, separately or in addition, to identify instances in the corpus of annotated training sample data wherein a token or token span may be inconsistently or erroneously annotated as not belonging to a slot span, defined as pertinent tokens or pertinent token spans.

It shall be noted that the slot format evaluator in S230 may be implemented in or as a part of the identification of the slot format error or inconsistency in S220.

In a preferred embodiment, S230 may function to implement a voting scheme that may function to form a policy suggestion to either remove an extraneous token or extraneous token span from a labeled slot span or add a pertinent token or pertinent token span to a labeled slot span. That is, in such embodiments, a candidate token (i.e., a token under evaluation) that is adjacent to a labeled slot span, but currently excluded, may be added to the labeled slot span if S230 determines that the majority of candidate-slot pairs of a slot-type within a given corpus occur labeled as part of the slot span in the dataset. Conversely, in some embodiments, a candidate token that may be positioned at a beginning or at an end of a target labeled slot span may be removed or extracted from the labeled slot span if S230 (via the slot span format evaluator) determines that the majority or plurality of candidate-slot pairs of a distinct slot-type within a given corpus do not occur as a labeled part of the slot spans of the slot-type within the corpus. Accordingly, S230 may function to define a new archetypal slot span or prototypical slot span to expand an annotation of the slot span in a direction towards the pertinent token to properly include the pertinent token that was erroneously excluded from the labeled slot span or contract an annotation of the slot span in a direction that excludes the extraneous token (i.e., towards a middle or body of the slot span) to properly exclude the extraneous token that was erroneously included within the labeled slot span.

Conversely, a candidate token that is included at either end of a labeled slot span may be removed from the labeled slot span if S230 determines that the majority of candidate-slot pairs of a slot-type within a given corpus do not occur as a labeled part of the slot span in the given corpus. Accordingly, S230 may function to contract an annotation of the slot span on the side of the extraneous token to properly exclude the extraneous token that was erroneously included in the labeled slot span.
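The voting scheme may be illustrated with the following minimal sketch. It assumes that each training sample is a list of (token, label) pairs in which the string "O" marks an unlabeled token, and that a vote is cast per slot type, per side of the span, and per candidate token. This data layout, the function names, and the treatment of ties (no suggestion) are assumptions made for illustration only.

```python
from collections import Counter

O = "O"  # marker for an unlabeled token (an assumed convention)

def spans(sample):
    """Yield (start, end, label) for each maximal run of identically labeled tokens."""
    start = 0
    for i in range(1, len(sample) + 1):
        if i == len(sample) or sample[i][1] != sample[start][1]:
            if sample[start][1] != O:
                yield start, i, sample[start][1]
            start = i

def boundary_votes(corpus):
    """Count, per (slot type, side, token), how often the token is inside vs. outside."""
    votes = Counter()
    for sample in corpus:
        for start, end, label in spans(sample):
            # Tokens at the edges of the span vote "inside".
            votes[label, "left", sample[start][0], "inside"] += 1
            votes[label, "right", sample[end - 1][0], "inside"] += 1
            # Unlabeled neighbors of the span vote "outside".
            if start > 0 and sample[start - 1][1] == O:
                votes[label, "left", sample[start - 1][0], "outside"] += 1
            if end < len(sample) and sample[end][1] == O:
                votes[label, "right", sample[end][0], "outside"] += 1
    return votes

def suggest(votes, label, side, token):
    """Return 'include', 'exclude', or None on a tie, by majority vote."""
    inside = votes[label, side, token, "inside"]
    outside = votes[label, side, token, "outside"]
    if inside == outside:
        return None
    return "include" if inside > outside else "exclude"
```

A suggestion of "include" for an excluded neighbor corresponds to expanding the annotation towards a pertinent token, and a suggestion of "exclude" for an edge token corresponds to contracting the annotation away from an extraneous token.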

In a preferred embodiment, the slot span format evaluator may include the use of an n-gram structure for evaluating tokens within training samples and identifying one or more tokens in a given training sample as either a candidate for inclusion in or exclusion from a given slot span. As mentioned above, a voting scheme or token inclusion/exclusion technique may be implemented to further determine whether a candidate token may be formally removed from or formally added to a given slot span of a training sample. Specifically, in such embodiments, S230 may function to build specific types of n-grams for evaluating a given slot span of a training sample, defined as left- and right n-gram sets. An n-gram preferably relates to a series of a finite number (n) of tokens from a given sample. A left n-gram set preferably relates to a set of token sequences that may exclude the last token in the n-gram. For example, a valid left n-gram set of the n-gram “my premier savings account” is {“my”, “premier”, “savings”, “my premier”, “premier savings”, “my premier savings”}. Similarly, a right n-gram set preferably relates to the set of token sequences that exclude the first token in the n-gram.
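The construction of left and right n-gram sets may be sketched as follows, assuming whitespace tokenization and taking the set to contain every contiguous token sequence that omits the last (or first) token, which is consistent with the “my premier savings account” example. The function names are hypothetical.

```python
def contiguous_ngrams(tokens):
    """Return every contiguous token sequence of the input, joined by spaces."""
    return {
        " ".join(tokens[i:j])
        for i in range(len(tokens))
        for j in range(i + 1, len(tokens) + 1)
    }

def left_ngram_set(ngram: str):
    """Token sequences of the n-gram that exclude its last token."""
    return contiguous_ngrams(ngram.split()[:-1])

def right_ngram_set(ngram: str):
    """Token sequences of the n-gram that exclude its first token."""
    return contiguous_ngrams(ngram.split()[1:])
```

Under these assumptions, the right n-gram set of “my premier savings account” contains “premier”, “savings”, “account”, “premier savings”, “savings account”, and “premier savings account”.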

Additionally, or alternatively, S230 may include combining a left n-gram set and a right n-gram set into a combined left- and right n-gram set, thereby defining a comprehensive slot span evaluator for a given corpus that may function to evaluate a slot span format along a left side and a right side of a slot span in parallel.

In a preferred embodiment, the slot format checker may function to automatically construct left- and right n-gram sets for each potential slot type, the elements of which may serve as a lexicon upon which to identify the archetypal slot formats to identify extraneous and/or pertinent tokens and token spans.

In a preferred embodiment, the slot format checker may be applied with both left and right n-gram sets individually to each slot type in the corpus, yielding policy suggestions that attempt to maximize slot format consistency within each slot type in the corpus.

2.4 Slot Label Errors or Inconsistencies

S240, which includes identifying one or more annotation errors or inconsistencies in one or more slot labels of a given training sample, may function to evaluate each distinct slot label of the training sample and identify whether the one or more slot labels annotated to one or more slot spans of the training sample include slot label errors or slot label inconsistencies.

A. Omission Inconsistencies

In some embodiments, a first type of slot label inconsistency and/or error may include an Omission Inconsistency or an Omission Error, which may occur when a token or span of tokens should be labeled with a certain slot label, but is not. That is, Omission Inconsistencies may occur when an unlabeled span should be labeled as a slot, as shown by way of example in FIG. 3. For example, if a token span “checking account” should be labeled as a SOURCE slot but is not, this may amount to an Omission Inconsistency or Omission Error. While in practice it may be challenging to identify omissions, in one implementation an Omission Inconsistency may be identified for an unannotated span of tokens if the preponderance or a threshold amount of the annotated training sample data of a given corpus of training samples indicates that the span under evaluation is labeled with a certain slot label.
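A minimal sketch of threshold-based omission detection is shown below. It assumes that each training sample is a list of (token, label) pairs with "O" marking an unlabeled token, that candidate spans have a fixed length of n tokens, and that a threshold of 0.5 represents a preponderance; each of these choices, and the function names, are assumptions for illustration.

```python
from collections import Counter, defaultdict

O = "O"  # marker for an unlabeled token (an assumed convention)

def label_counts(corpus, n):
    """Count, for every n-token span, the label sequences it was annotated with."""
    counts = defaultdict(Counter)
    for sample in corpus:
        for i in range(len(sample) - n + 1):
            window = sample[i:i + n]
            tokens = tuple(tok for tok, _ in window)
            labels = tuple(lab for _, lab in window)
            counts[tokens][labels] += 1
    return counts

def find_omissions(corpus, n, threshold=0.5):
    """Flag unlabeled spans that elsewhere carry one slot label above the threshold."""
    counts = label_counts(corpus, n)
    flagged = []
    for idx, sample in enumerate(corpus):
        for i in range(len(sample) - n + 1):
            window = sample[i:i + n]
            if any(lab != O for _, lab in window):
                continue  # only fully unlabeled spans can be omissions
            tokens = tuple(tok for tok, _ in window)
            total = sum(counts[tokens].values())
            labels, freq = counts[tokens].most_common(1)[0]
            if len(set(labels)) == 1 and labels[0] != O and freq / total > threshold:
                flagged.append((idx, i, " ".join(tokens), labels[0]))
    return flagged
```

Each flagged entry reports the sample index, the start position of the span, the span text, and the slot label suggested by the rest of the corpus.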

B. Addition Inconsistencies

In one or more embodiments, a second type of slot label inconsistency and/or error may include an Addition Inconsistency, which may occur when a token or span of tokens is labeled with a slot label but should be unlabeled, as shown by way of example in FIG. 3. That is, Addition Inconsistencies may occur when a labeled span should not be labeled, thereby defining an extraneous slot label. For example, in the phrase “please start checking my savings account”, if the token “checking” is labeled as an ACCOUNT, this may be considered an erroneous annotation, as “checking” in this instance does not refer to a type of financial account but possibly a requested action.

In one or more embodiments, the Addition Inconsistency may be equally applicable to one or more tokens that may be erroneously added to a target annotated slot span and therefore, should be removed.

C. Wrong Label Inconsistencies

In some embodiments, a third type of slot label inconsistency and/or error may include a “Wrong Label” Inconsistency, which may occur when a token span is annotated with the wrong slot label, as shown by way of example in FIG. 3. That is, Wrong Label Inconsistencies occur when a token span is labeled as one slot type but should be labeled as another slot type. For a given corpus of raw training samples of data, a plurality of distinct slot labels may be applicable to a plurality of distinct slot types within the given corpus. Mislabeling of a slot span may occur because of confusion in a use of a token within a training sample, a misunderstanding of a context surrounding a slot span, and the like. For example, in the phrase “transfer 30 dollars please”, if the token span “30 dollars” is labeled as an ACCOUNT slot, the annotation is not accurate because the token span “30 dollars” may not accurately reflect or inform a financial account. In a preferred embodiment, the Wrong Label Inconsistency may be considered distinct from the Omission Inconsistency in that a label may not be missing in the case of the wrong label.

D. Swapped Label Inconsistencies

In one or more embodiments, a sub-type or special case of the Wrong Label Inconsistency may include a Swapped Label Inconsistency, which may occur when two token spans in a training sample have wrong labels for a given annotation scheme, and where the wrong label of one of the two token spans is the correct label for the other, as shown by way of example in FIG. 3. That is, a Swapped Label Inconsistency may occur when the labels of a pair of annotated slots ought to be swapped, such that a first slot span of the pair may be augmented with the annotations of the second slot span of the pair and similarly, the second slot span of the pair may be augmented with annotations of the first slot span, which effectively swaps the labels to be in a correct position along the multiple distinct slot spans of a training sample. Swapped labels may frequently occur (but are not limited to occurring) when two slot types share the same entity domain (e.g., to- and from-airports are both airports). For example, in the phrase “show me flights from New York to Berlin”, if the token span “New York” is labeled as the TO-AIRPORT slot, and “Berlin” as the FROM-AIRPORT slot, this could be considered a Swapped Label Inconsistency.
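One possible way to surface swapped labels is sketched below. It assumes a simplified representation in which each annotated slot is a (cue token, span text, label) triple, where the cue token is the word immediately preceding the span (e.g., “from” or “to”), and it learns the expected label for each cue by majority across the corpus. The cue-based heuristic and all names are assumptions introduced for illustration and are not the evaluator described herein.

```python
from collections import Counter, defaultdict

def majority_label_by_cue(corpus):
    """Learn the most common slot label that follows each cue token."""
    counts = defaultdict(Counter)
    for sample in corpus:
        for cue, _span, label in sample:
            counts[cue][label] += 1
    return {cue: c.most_common(1)[0][0] for cue, c in counts.items()}

def find_swapped_pairs(sample, majority):
    """Return index pairs whose labels are each the expected label of the other."""
    pairs = []
    for i in range(len(sample)):
        for j in range(i + 1, len(sample)):
            cue_i, _, label_i = sample[i]
            cue_j, _, label_j = sample[j]
            expected_i, expected_j = majority.get(cue_i), majority.get(cue_j)
            if (label_i != label_j and label_i == expected_j
                    and label_j == expected_i):
                pairs.append((i, j))
    return pairs

def swap_labels(sample, i, j):
    """Return a copy of the sample with the labels of spans i and j exchanged."""
    fixed = list(sample)
    fixed[i] = (sample[i][0], sample[i][1], sample[j][2])
    fixed[j] = (sample[j][0], sample[j][1], sample[i][2])
    return fixed
```

The repair step only exchanges the two labels and leaves the slot spans themselves unchanged, which matches the description of a swap as a relabeling of an otherwise correctly segmented pair.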

2.5 Slot Label Variation Evaluator

S250, which includes implementing a variation n-gram method, may function to set a variation n-gram algorithm that may function to evaluate one or more slot labels annotated to one or more slot spans of a given training sample in order to determine if any of the above slot label variations may be present. In a preferred embodiment, S250 implements a variation n-gram method to find all n-grams of a fixed radius of k tokens centered around an annotated slot, which may sometimes be referred to herein as a slot nucleus, where k relates to a number of tokens. In such embodiments, the n-gram centered around the slot nucleus may be referred to as a variation n-gram. In a second embodiment, S250 may function to consider a dynamic radius in the circumstances in which the slot nucleus may be surrounded with more tokens on one side than the other.

It shall be noted that the slot label evaluator in S250 may be implemented in or as a part of the identification of the slot label error or inconsistency in S240.

In one or more embodiments, training samples in which the slot label (or an absence of a slot label) attached to a designated slot nucleus of the training sample differs from that of a majority may be marked as inconsistent or erroneous.

Additionally, or alternatively, S250, which includes implementing a variation n-gram method, may function, for a given training sample, to [1] set a slot nucleus (in some embodiments, S250 may function to sequentially move along the training sample to reset to a new slot nucleus), [2] set a fixed radius k of tokens from the slot nucleus, [3] identify all variation n-grams (i.e., sentence fragments), [4] identify a consistent labeling convention from a majority vote among samples in a subject dataset, and [5] evaluate non-annotation and annotation agreements between sentence fragments defined by the variation n-gram. That is, if the voting stage determines a convention, samples with annotations that do not match that convention are flagged as being inconsistently or erroneously annotated in item [5].
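Items [1] through [5] may be sketched as follows. The sketch assumes that each training sample is a list of (token, label) pairs with "O" marking an unlabeled token, that every token in turn serves as a single-token slot nucleus, that the radius is clipped at sample boundaries, and that a convention exists only when one label holds a strict majority for a fragment. These simplifications and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict

def variation_ngrams(sample, k):
    """Yield (position, fragment, label) for each token treated as a slot nucleus."""
    tokens = [tok for tok, _ in sample]
    for i, (_, label) in enumerate(sample):
        lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
        # The fragment records the context tokens and the nucleus offset inside it.
        yield i, (tuple(tokens[lo:hi]), i - lo), label

def flag_label_variation(corpus, k=1):
    """Flag nuclei whose label differs from the strict-majority label of the fragment."""
    counts = defaultdict(Counter)
    for sample in corpus:
        for _, fragment, label in variation_ngrams(sample, k):
            counts[fragment][label] += 1
    flagged = []
    for idx, sample in enumerate(corpus):
        for pos, fragment, label in variation_ngrams(sample, k):
            convention, freq = counts[fragment].most_common(1)[0]
            total = sum(counts[fragment].values())
            if freq * 2 > total and label != convention:
                flagged.append((idx, pos, label, convention))
    return flagged
```

Because the unlabeled marker is counted like any other label, the same vote can surface omissions (an unlabeled nucleus where the convention is a slot label), additions (a labeled nucleus where the convention is no label), and wrong labels.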

2.6 Error Correction Synthesis

Optionally, or alternatively, S260 may function to combine, aggregate, or synthesize an application of the slot format evaluator and the slot label variation evaluator together against one or more training samples of a given corpus of raw training data. In a preferred implementation, multiple methods of error correction synthesis or aggregation may be used in some combination.

In one embodiment, S260 may function to perform the error correction steps for the slot format evaluator and the slot label variation evaluator in series (i.e., one at a time), in a specified order, over an entire data corpus.

In another embodiment, S260 may function to partition the dataset into two or more portions, which may include “train” and “evaluate” sets or splits, and to train a model on the train set using any suitable algorithm including, but not limited to, a CRF algorithm, a long short-term memory (LSTM) algorithm, and/or the like. In some embodiments, a conditional random field (CRF) may be trained on the train set, and subsequently tested on the evaluate set. Samples where the CRF predicts a different output than the annotated labels may be marked as possible inconsistencies. S260 may function to repeat the train and evaluate split procedure until all samples have been evaluated.
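The repeated train and evaluate procedure may be sketched as below. To keep the sketch self-contained, a simple tagger that memorizes the most frequent label of each token is used as a stand-in for the CRF; it is not a CRF and is swapped in only for illustration. The round-robin fold assignment, the default of three folds, and the function names are likewise assumptions.

```python
from collections import Counter, defaultdict

def train_tagger(train_samples):
    """Stand-in for a CRF: memorize the most frequent label of each token."""
    counts = defaultdict(Counter)
    for sample in train_samples:
        for token, label in sample:
            counts[token][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def predict(tagger, tokens, default="O"):
    """Label each token with its memorized label, or the default if unseen."""
    return [tagger.get(tok, default) for tok in tokens]

def cross_check(corpus, folds=3):
    """Hold out each fold in turn and flag samples whose labels the model disputes."""
    flagged = []
    for f in range(folds):
        evaluate = [i for i in range(len(corpus)) if i % folds == f]
        train = [corpus[i] for i in range(len(corpus)) if i % folds != f]
        tagger = train_tagger(train)
        for i in evaluate:
            tokens = [tok for tok, _ in corpus[i]]
            gold = [lab for _, lab in corpus[i]]
            if predict(tagger, tokens) != gold:
                flagged.append(i)
    return sorted(flagged)
```

Every sample is evaluated exactly once by a model that did not see it during training, so a disagreement reflects the rest of the corpus rather than the sample's own annotation.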

In another embodiment, S260 may function to apply a combined evaluator in parallel such that both the slot format and the slot labels of a given training sample may be evaluated at the same time.

In another embodiment, S260 may function to apply a combined evaluator in an intelligent sequence such that applying a first evaluator of the combined pair against a given training sample may inform an intelligent application of a second evaluator of the combined pair.

The system and methods of the preferred embodiment and variations thereof can be embodied and/or implemented at least in part as a machine configured to receive a computer-readable medium storing computer-readable instructions. The instructions are preferably executed by computer-executable components preferably integrated with the system and one or more portions of the processors and/or the controllers. The computer-readable medium can be stored on any suitable computer-readable media such as RAMs, ROMs, flash memory, EEPROMs, optical devices (CD or DVD), hard drives, floppy drives, or any suitable device. The computer-executable component is preferably a general or application specific processor, but any suitable dedicated hardware or hardware/firmware combination device can alternatively or additionally execute the instructions.

Although omitted for conciseness, the preferred embodiments include every combination and permutation of the implementations of the systems and methods described herein.

As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to the preferred embodiments of the invention without departing from the scope of this invention defined in the following claims.

What is claimed:
 1. A method for automatically detecting annotation discrepancies in annotated training data samples and repairing the annotated training data samples for a machine learning-based automated dialogue system, the method comprising: evaluating a corpus of a plurality of distinct training data samples; identifying one or more of a slot span defect and a slot label defect of a target annotated slot span of a target training data sample of the corpus based on the evaluation, wherein identifying whether a target annotated slot of a training data sample deviates from a left- and right n-gram set created for the corpus; and automatically correcting one or more annotations of the target annotated slot span based on the identified one or more of the slot span defect and the slot label defect.
 2. A system for automatically detecting annotation discrepancies in annotated training data samples and repairing the annotated training data samples for a machine learning-based automated dialogue system, the system comprising: a label variation evaluator or a slot label evaluator implemented by one or more computers that: evaluate a corpus of a plurality of distinct training data samples; identify one or more of a slot span defect and a slot label defect of a target annotated slot span of a target training data sample of the corpus based on the evaluation, wherein identifying whether a target annotated slot of a training data sample deviates from a left- and right n-gram set created for the corpus; and automatically correct one or more annotations of the target annotated slot span based on the identified one or more of the slot span defect and the slot label defect.